home *** CD-ROM | disk | FTP | other *** search
- Path: po.CWRU.Edu!mab22
- From: mab22@po.CWRU.Edu (Michael A. Balfour)
- Newsgroups: comp.lang.c
- Subject: Re: unique id for a string
- Date: 22 Feb 1996 19:47:22 GMT
- Organization: Case Western Reserve University, Cleveland, OH (USA)
- Message-ID: <4gih8a$b62@madeline.INS.CWRU.Edu>
- References: <1996Feb16.175601.114182@kuhub.cc.ukans.edu> <3129dd39.961626@news.iquest.net> <4gfpveINNe68@keats.ugrad.cs.ubc.ca>
- Reply-To: mab22@po.CWRU.Edu (Michael A. Balfour)
- NNTP-Posting-Host: kanga.ins.cwru.edu
-
-
- In a previous article, c2a192@ugrad.cs.ubc.ca (Kazimir Kylheku) says:
-
- >In article <3129dd39.961626@news.iquest.net>,
- >Robert B. Clark <rclark@iquest.net> wrote:
- > >On 16 Feb 96 17:56:01 CST, anh@kuhub.cc.ukans.edu wrote:
- > >
- > >>HELLO = H * 26^4 + E * 26 ^3 + L * 26^2 + L * 26^1 + E * 26^0
- > >>
- > >>But the id is way too big. I need something that fits within a 32-bits int
- > >>or a 64-bits long.
- > >
- > >You're thinking way too hard, perhaps. Look up the algorithm for CRCs
- > >(cyclic redundancy checks/codes) in Snippets or most any other popular
- > >programmer's reference.
- >
- >But he wants a _unique_ ID. This is not possible, since the space of possible
- >words is much larger than the space of 32- or even 64-bit integers. Even for a
- >limited dictionary of around 200,000 words, a function that _guarantees_ a
- >unique hash for each indentifier into a 32-bit space is difficult to find.
-
- Actually, it's not *that* difficult to find. Try running gperf from GNU
- or reading the paper cited in my previous post in this subject. The
- results in the paper claim to be able to find a perfect hash function
- for 262,144 18-character words in 16.087 seconds on a Sun Sparc 2.
- Doesn't sound that difficult to me. :)
-
- >One way to get a unique mapping is to simply assign increasing numbers to the
- >200,000+ words, and use a hashing technique to quickly look up a word given its
- >integer tag, and look up the tag given the word. With the proper
- >represenatation, both operations can be done in "constant time".
-
- Once again, using a perfect minimal hash function will provide an easy
- method for accomplishing this.
-
- Mike Balfour
- --
- ----------------------------------+--------------------------------
- Mike Balfour, Partner | BS/MS Graduate - ECMP
- Overload Engineering | Case Western Reserve University
- "New Ideas for a Brighter Future" | Cleveland, OH
-